SRIUBC: Simple Similarity Features for Semantic Textual Similarity
Abstract
We describe the systems submitted by SRI International and the University of the Basque Country for the Semantic Textual Similarity (STS) task at SemEval-2012. Our systems use a simple set of features that combine semantic similarity resources, lexical match heuristics, and part-of-speech (POS) information. We also incorporate precision-focused scores over lexical and POS information derived from the BLEU measure, and lexical and POS features computed over split-bigrams from the ROUGE-S measure. These features were used to train support vector regressors over the pairs in the training data. Of the three systems we submitted, two performed well in the overall ranking, with split-bigrams improving performance on pairs drawn from the MSR Video Description Corpus. Our third system maintained three separate regressors, each trained on pairs from a single STS dataset. It used a multinomial classifier to predict which dataset's regressor would be most appropriate for a given pair, and then scored the pair with that regressor. This system underperformed, primarily due to errors in the dataset predictor.
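The abstract does not spell out how the ROUGE-S-derived features are computed, so the following is only a minimal sketch of one plausible lexical variant. It assumes that a "split-bigram" is what ROUGE-S calls a skip-bigram (any ordered pair of tokens from the same sentence, with gaps allowed) and that the feature is the F1 overlap between the two sentences' pair multisets. The function names and the `max_gap` parameter are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def skip_bigrams(tokens, max_gap=None):
    """Count ordered token pairs (in sentence order), optionally limiting the gap.

    max_gap is a hypothetical knob: the maximum number of tokens allowed
    between the two members of a pair; None means no limit.
    """
    pairs = Counter()
    for i, j in combinations(range(len(tokens)), 2):
        if max_gap is None or j - i - 1 <= max_gap:
            pairs[(tokens[i], tokens[j])] += 1
    return pairs


def skip_bigram_f1(tokens_a, tokens_b, max_gap=None):
    """F1 overlap of the two sentences' skip-bigram multisets (0.0 if none match)."""
    pairs_a = skip_bigrams(tokens_a, max_gap)
    pairs_b = skip_bigrams(tokens_b, max_gap)
    # Counter intersection keeps the minimum count of each shared pair.
    overlap = sum((pairs_a & pairs_b).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pairs_a.values())
    recall = overlap / sum(pairs_b.values())
    return 2 * precision * recall / (precision + recall)
```

A POS-level variant of the same idea would pass sequences of POS tags instead of word tokens. In a pipeline like the one described, a score of this kind would be one column in the feature vector given to the regressor.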
Similar resources
SRIUBC-Core: Multiword Soft Similarity Models for Textual Similarity
In this year’s Semantic Textual Similarity evaluation, we explore the contribution of models that provide soft similarity scores across spans of multiple words, over the previous year’s system. To this end, we explored the use of neural probabilistic language models and a TF-IDF weighted variant of Explicit Semantic Analysis. The neural language model systems used vector representations of indi...
TATO: Leveraging on Multiple Strategies for Semantic Textual Similarity
In this paper, we describe the TATO system which participated in the SemEval-2015 Task 2a: “Semantic Textual Similarity (STS) for English”. Our system is trained on published datasets from the previous competitions. Based on some machine learning techniques, it combines multiple similarity measures of varying complexity ranging from simple lexical and syntactic similarity measures to complex se...
L2F/INESC-ID at SemEval-2017 Tasks 1 and 2: Lexical and semantic features in word and textual similarity
This paper describes our approach to the SemEval-2017 “Semantic Textual Similarity” and “Multilingual Word Similarity” tasks. In the former, we test our approach in both English and Spanish, and use a linguistically-rich set of features. These move from lexical to semantic features. In particular, we try to take advantage of the recent Abstract Meaning Representation and SMATCH measure. Althoug...
ExB Themis: Extensive Feature Extraction from Word Alignments for Semantic Textual Similarity
We present ExB Themis – a word alignment-based semantic textual similarity system developed for SemEval-2015 Task 2: Semantic Textual Similarity. It combines both string and semantic similarity measures as well as alignment features using Support Vector Regression. It occupies the first three places on Spanish data and additionally places second on English data. ExB Themis proved to be the best ...
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures
We present the UKP system which performed best in the Semantic Textual Similarity (STS) task at SemEval-2012 in two out of three metrics. It uses a simple log-linear regression model, trained on the training data, to combine multiple text similarity measures of varying complexity. These range from simple character and word n-grams and common subsequences to complex features such as Explicit Sem...
Publication date: 2012